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1. INTRODUCTION 



Statistical methodology has long been a familiar tool for 
use in understanding our natural environment. Classical examples 
of applications of statistics are seen in weather forecasting, 
in evaluation of attempts at weather modification by cloud seed- 
ing, and in descriptions of the fluctuations in the sea surface. 

Now the accessibility of new and extensive data from a variety 
of remote sensing sources, such as earth orbiting and geostationary 
satellites, again calls for the development and application of 
appropriate statistical methodology. Classical methods of statis- 
tics and of probability modeling frequently must be adapted to 
the new needs. The process of adaptation will proceed most 
efficiently if statisticians work cooperatively with the scientists 
actually obtaining data and studying the associated natural 
phenomena. Conferences such as PRIMARS I are of great value in 
promoting the necessary interchange of information and the stimulus 
to approach novel and difficult problems in a realistic manner. 

This paper describes new approaches to the analysis of 
data, in particular to quite "noisy” data of the sort that is 
likely to be encountered when observing the natural environment. 

The descriptions given will necessarily be brief, but an attempt 
will be made to show how the methods and viewpoints presented may 
be applied to problems arising in remote sensing. 
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2. ROBUST METHODOLOGY; REQUIREMENTS AND POSSIBILITIES 

Many scientists who have closely examined real data 
have encountered occasional, or even frequent, anomalous behavior. 
Apparent anomalies in data may be with respect to either 

(a) preconceptions as to "proper" data behavior, these perhaps 
being buttressed by (physical) theory, or 

(b) the nature of the general pattern of the data, especially 
those data points in the immediate neighborhood, e.g., 

in time or space. 

2.1. Plots 

In simple circumstances graphical plots will quickly reveal 

those points that are blatant anomalies. For instance suppose that 

one wishes to investigate data concerning the relationship between 

wind velocity and whitecap cover in the ocean. Theory may suggest 

a specific relationship, e.g. that white-cap cover, C, be nearly 

a cubic function of wind velocity v, so that it will be tempting 

to plot C vs V and note an appearance as shown on Fig. 1; there 

solid black dots represent (simulated) raw data. Since the eye 

2 

finds it difficult to distinguish curves of the form C = cxv , 

3 7/2 ... 

C = av , C = av ' , etc., -^roir. one another, and yet is sensitive to 

3 

departures from linearity, a graph of C V£ v suggests itself, bul 
is not included here. A plot on log-log paper may be still better. 
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As presented, the data conforms in general to the theorized relation- 
ship or scaling, with the obvious exception of the circled point to 
the right. Such an anomalous point, or points, represents a challenge 
both to statistical technology and to the ultimate user of the data. 

Statistical technology assumes the responsibility for revealing 
the presence of such points, and, if possible, for providing a 
meaningful and useful summary of the remaining points. It falls 
to the consumer or ultimate user of the data, preferably with 
the help of a subject-matter specialist (physicist or oceanographer) 
to interpret the apparently anomalous maverick — or exotic, or 
outlying — data point: is it 

(i) an evidence of the failure of the relation C = av^, say 
for large velocities, 
or is it 

(ii) an outright error in data recording, and to be disregarded? 

being just two possible options. 

Note that simple graphs are invaluable for pointing out 
extreme outliers in simple, one explanatory variable, situations. 

If more variables are required, informative plots are more diffi- 
cult without the use of more statistical technology. We next 
I show that classical, least-squares, technology may be quite mis- 
leading, but that replacements are available. See Mosteller and 
iTukey (1977), abbreviated OT hereafter. 
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2.2. Fits and Residual Plots 



Suppose that one wishes to summarize data such as that 

3 

in Fig. 1 by fitting the relationship C = av , i.e., determining 
the parameter a from the data. The classical and automatic way 
of doing so is to apply least squares; computer programs are uni- 
versally available, even for handheld calculators (the TI 59, or 

HP 67) . What are we likely to find? A least-squares line (treat- 
3 

ing w = V as the independent variable presents C = aw; one 
can also plot and fit = (a)^'^^v, and there may be reasons 

for this choice) is quite apt to fatally misrepresent the situation, 
responding much too sensitively to the single (here encircled) 
outlying value, and straying systematically away from the main 
body of the data; see the points represented by o in Fig. 1. 

An alternative method for fitting, described in 1^, is 
less susceptible to outlier influence — is far more robust to 
departures from basic assumptions — than is the ordinary least 
squares (OLS) method. This new method', termed biweight fitting , 
is carried out by a procedure that uses the OLS computation itera- 
tively. In the course of the computations weights are auto- 
matically developed that reduce the influence of the encircled 
value of Fig. 1, permitting the fit to more closely approximate 
the main body of the data. We now describe and illustrate the 
biweight fitting procedure as it is adapted to the problem of 
determining the parameter a in the relation y. vs ax.. 
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Biweight Fitting Calculation 

(1) Compute the kth (k = 1,2,3,...) iterative estimate of a, 

(k) 

denoted by a by solving 



n 



(k) . (k-1) 

a X . ) X . w . 

1 11 



0 , 



to obtain 



n 



(k-1) 



a 



r I 

(k) i=l 



2. (k-1) 



) X . w . 

iii ^ ^ 



(k-1 ) 

(2) the weights, w^ , are of this form: 



,(k-l) 

i 





7- 



if (•) < 1 



=0 it (•) > 1 

where (•) refers to the term ; 

(k-1) 

S is a scale factor (robust replacement for the 

standard deviation) that may be computed in the following manner . 

s t 

(3) The k-1 iterated value of the scale factor is 



s(k-i) 



median { I y^ - ^^x^l). 



c being a constant of value 6, or 9; 
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(4) the first value, of the iterative sequence can be 

obtained by equalizing all weights =1), which is 

equivalent to OLS ; alternatively, one can utilize a "robust 
start," suggestions for which can be found in MT . 

The iteration is carried on until the difference between successive 
values is small; usually 4 to 8 iterations is sufficient. The 
resulting a-estimate can be denoted by a. 

Following the fitting it is informative to plot the 
residual values: 



r . = y ■ “ ax . = y • ~ y • 

1-^1 1-^1 -^1 



i — 1,2,. ..,n. 



y^^ being shorthand for the predicted y value. In case there 
is a single outlier, as in Fig. 1, the fitted line will tend to 
hug the major point cloud, and a histogram of the residuals will 
dramatically reveal the presence of the outlier, suggesting 
further investigation. A plot of r^^ ^ y^ is also useful. See 
MT for further suggestions . 



2.3. Numerical Illustration 

The following are a set of (simulated) whitecap percentages 
and corresponding wind velocities. Alongside are values for 
white cap coverage estimated by OLS and by the biweight procedure. 
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Velocity 



2 

7 

10 

15 

18 

21 

24 



Cover 

("Actual") 



Cover 

(OLS Estimate) 



Cover 

(Robust Estimate) 



0.011 

0.63 

1.30 

3. 89 

2.89 
8.16 

25.7 



0.011 

0.49 

1.43 

4.82 

8.33 

13.2 

19.7 



0.0067 

0.29 

0.84 

2.83 

4.88 

7.76 

11.6 



It is clear from the above table, and perhaps clearer from Fig. 1, 
that the OLS solution, in its attempt to fit the point C(24) = 25, 
systematically and considerably over-estimates the points at v = 15 
and above. The biweight estimator performs much better, allowing 
a closer fit to all data other than C(24) . A residual plot brings 
attention to bear on that point. 

Since the values of "Actual Cover" were actually constructed 

3 

by forming O.OOSv and adding Gaussian random noise with value pro- 

3 

portional to C(v), and since the sequence of values of O.OOSv 
were 0.0064, 0.27, 0.80, 2.7, 4.67, 7.41, 11.06, we cannot fault 
the manner in which the biweight procedure functioned in this 
example and are encouraged to use it more widely. 
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2.4. Possible Application to Remote Sensing Data 

In a paper in this conference proceedings by Depriest 
(1979), and in Fleming (1979), a problem arising from partial 
cloud cover contamination of remote sensing data is described 
and addressed. This problem has the following origin. A series 
of measurements are made on a physical quantity (sea surface 
temperatures) but are contaminated. That is, in the case of 
sea surface temperatures, if no clouds are present the measure- 
ments are approximately normally distributed around y (the true 
temperature) . However, if clouds are present a fraction of the 
measurements are made artificially smaller, cloud temperatures 
being lower than those at earth surface. The problem is to esti- 
mate y. Techniques for doing so are described by Depriest (1979) 
and by Fleming (1979). We describe a possible alternative approach 
that uses robust regression. Operational characteristics of 
the two procedures have not yet been compared. 

(1) Arrange the measurements in order: Y-, < Y-y < y < ♦•• < 

X 2 3 

^ The largest observations may well appear 

similar to the largest order statistics of a norm.al distri- 
bution with (unknown) mean y and standard deviation a 
(sometimes assumed known/ although caution is in order) / 
while the smaller ones are likely to depart systematically. 

(2) Carry out a preliminary plot of 

Yk ^ ' k = n, n-1, n-2, ... 
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where <I> ^ (p) is the inverse function of the unit normal: 
recall that if 

A / \ / 1 2 , dz 

(y) = J exp (—5- z ) ■ , 

^ /2tt 

is the unit normal distribution, then the solution of the 
equation <I> (y) = p gives 

y (p) = ^ (p) ; 



4 *, and hence ^ are widely tabulated. Alternatively, 
use Arithmetic Probability Paper. If y^^ is an ordered 
observation from a normal population, then the plot should 
appear straight, while a systematic departure from linearity 
indicates a departure from normality. Suppose departures 
begin to occur at k = D; sometimes D may be greater than 
n/ 2 . One may first eye-fit a straight line to the points 
k = n, n- 1 , ... , D. Then Yyi /2 ~ ^ (estimated temperature), 

i.e. the value of the fitted line at n/2 should give a 
reasonable value for y. 

( 3 ) Going further in a formal direction, one may wish to fit a 
line to the data points. Here a biweight fit should behave 
well, tending to be oblivious to spurious (cloud contamina- 
tion) points. One can proceed to fit the relation 



with 



yj^ V£ y + aXj^ 



"k - ' 



k = n , n-1 , n-2 , ... ; 
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a start using the eye-fit to points k = n, n-1, ... , D 

may be worthwhile. Finally, quote the estimate 



- med - 2 ^^ 0/2 ^{n/2)+l^ 



n even ; 



/N 

" ^(n+l)/2' 



n odd . 



where 



> P + ox^ . 

The above procedure seems worth further investigation and refine- 
ment. One important step may be to adjust for the effect of 
correlation between order statistics when carrying out the 
regression . 
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3. SMOOTHING DATA 



If one plots certain environmental data, e.g. monthly 
total rainfall, or perhaps daily maximum temperature, at a 
particular location, systematic regularities seem to appear, 
but may be masked by noise. Often there is a seasonal pattern, 
i.e. one that is roughly cyclic in nature. Attempts to fit such 
a pattern with polynomials is doomed to failure, and selection 
of a set of sines and cosines that does well (Fourier series) 
may lead to many terms. Some method of smoothing the original 
series that lays bare the regularities is to be desired. After 
such is made available, one can study the residuals around it. 
Spectral analysis or some such formal procedure may then be of 
use. 

Classical smoothing procedures involve some form of moving 
average, and are susceptible to the python-swallowing-the pig 
difficulty: imagine using the linear smoothing operation 



on the y^ series 



Ry. 



1 

11 

£ 

(T; 



2 

9 

9 

9 



3 

7 

8 
8 



SYt = 



Yfi + Yt + y 



t+1 



4 5 

8 8 

7.7 ^ 

8 8 



6 

29 

10 



7 8 

10 8 

10 8 



9 

6 

7.7 

8 



10 

9 

® 
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clearly the entry of 29 at t = 5 into the smoothed value 
series Sy^ gives a serious distortion, travelling as it does 
in partially digested form through the next two terms of the 
smoothed series. If further smoothing is attempted the bulge 
is reduced slightly, but spreads out in time. 

On the other hand, the non-linear operation of taking 
running medians, as suggested by Tukey (1977)/ performs effec- 
tively (in both cases circled and values are copied from the 
original series; more sophisticated procedures can be invented 
as well) . The last row in the table, labelled Ry^/ gives the 
result of this robust smoothing; note that it behaves in an 
intuitively appealing manner, essentially ignoring the outlying 
value 29. Further steps can be taken to improve the "smooths," 
but we refer to Tukey (1977) Chapters 7 and 16 for details. 

Two furthet points may be made. The first is that the 
analysis of a sequence of data points, and their projection or 
forecasting in space or time, should not end with providing a 
smoothed or averaged version. Examination of the remaining vari- 
ation, e.g. the sequence y^ - Ry^/ called the "rough" by Tukey 
may well be rewarding; presence and suggestiveness of various 
outliers is much more evident in the rough (residuals) sequence 
than in the original sequence of data points. Secondly, the 
procedure described for smoothing simple sequences of data points 
must be adapted to planar (two-dimensional) data; some work has 
been done, but much remains. 
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4 . STOCHASTIC MODELING OF ICE PRESSURE RIDGES 



The dynamics of ice formation in the Earth's cold regions 
results in the development of irregular ice pile-ups, or pressure 
ridges. These ridges occur in an apparently random fashion in 
space; in fact the following regularities are observed by remote 
sensing methods (courtesy of Dr. W. Weeks, in a seminar at the 
Naval Postgraduate School, Monterey, California, Winter, 1979; 
see also Weeks, et al. (1979)); 

1) Along a sampling line (e.g. airplane flight path, or straight 
submarine track) ice ridges seem to appear in accordance 
with a stationary Poisson process, so if R(x) is the number 
of such ridges encountered over a distance x, then approxi- 
mately 

P{R(x) = n} = e ^ ■■■ , n = 0,1,2,... 

Xl • 

where X > 0 is the density of ice ridges. 

2) The probability distribution of ridge "sail heights" (or 

"U V 

"keel depths") may be approximated by the forms F(y) = 1-e ^ , 

2 

or 1-e ■* ; the best-fitting distribution may well depend 

upon the method of observation (averaging properties) . 

For further details see work referenced in Weeks et (1979) . 

Now it may be of interest to compute the distribution 
of the maximum sail height, or keel depth, that one is to encounter 
over a course of length x. This is very simple, given the 
particular distributions of-sail number and size and furthermore 
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assuming independent between ridge heights. Let H(x) be the 
maximum sail height; then 



P{H(x) < y} = I e~^^ ~T - [F(y)]^ 
n=0 

since all of the Poisson-distributed heights must be less than y 
in order for the maximum to be below y. 

Sum out to obtain 

p{H(x) <_ y} = exp{-Xx [1-F (y) ] } 

Depending upon which distribution is picked for ridge heights, 
we get 

a) P(H(x) <_ y} = exp(-Xxe 

_ 2 

b) P{H(x) <_ y} = exp(-Xxe ) 

These closely resemble classical extreme value distributions. 

Note that if logs are taken simplicity occurs: 

a') Jin p{H(x) ^ y} = -Xxe 

)ln(-Jln P{H(x) £ y}) = £n(Xx) - yy 

_ 2 

b') Jin p{h(x) £ y} = -Xxe ; 

Jln(-Jln P{H(x) £ y}) = iln(Xx) - vy^ . 



14 



If either of these formulas are to be used for practical purposes, 
values of the parameters must be obtained. In order to estimate 
parameters X, y, v in the above models from data one naturally 
thinks of the method of maximum likelihood. Suppose that we 
have observed R(x) = n ridges of heights ^ 2 ’ ' ^n' 

Then the maximum likelihood estimates are 




where as usual we have put 




- I 

n 



n 



i=l 



Hence our estimates are of the form 

a”) est Jln(-Jln P{H(x) £ y} ) = Jin X + fi.n x - yy 

/s /N p 

b”) est £n(-Jln P{H(x) £ y} ) = Jin X + 5-n x - vy 

If rather large samples are available and if distributional assump- 
tions are well satisfied one may feel comfortable with conventional 
standard errors based on Fisher information and normality; see 
Crame^r (1946). On the other hand, it is of interest to apply 
the jackkni fe technique (see R. G. Miller (1974) for a review) to 
obtain estimates of the variance of estimate due particularly to 

the ridge heights. To carry out the calculation, (i) compute 
^ ~2 

= n/y ; then (ii) compute 



n-1 

T 



( “ * ) ”'2 2 2 2 ” 

^ Yi + y^ '+•••+ y . T + 0 + V.,, + 

-^1 ^2 "D+i 



n 
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for j = 1,2,..., n; then (iii) compute the pseudovalues 

>N /V /V 

V. = ~ (n-1) and (iv) average to obtain a jackknifed 

^ rn ^ 

point estimate = (1/n) Zj_]^ variance 






JK 



Then we can estimate the standard error of the probability predictior 
e.g. b") by computing 



S.E. = (Var[est £n(-£n P{H(x) <_ y) ) ] ) 




A similar calculation is easily performed for model a) ; details 
are omitted. From the above results, approximate confidence inter- 
vals may be constructed for the probability of encountering a 
(maximum) ridge sail height less than y in magnitude. 

Fairly recent theoretical results of Efron and Hinkley 
(1978) suggest that if a traditional maximum likelihood approach 
is taken, one is better off using observed Fisher information 
rather than expected Fisher information in order to establish an 
approximate standard error in either case a) or b) . However, 
work of Reeds (1978) suggests that use of the jackknife in 
conjvmction with maximum likelihood yields results that tend 
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to be rather independent of the basic model chosen. Both of 
these suggestions must be validated by further work, a good 
deal of which will necessarily involve Monte Carlo simulation. 
Such work should be of great importance and interest to those 
who must assess the probabilities of extreme, rare, events, and 
who furthermore wish to provide some reasonably valid estimates 
of the error of their estimates. 

Acknowledgment . The writer is much indebted to LCDR C. F. 
Taylor, Jr., for his assistance in example robust regression 
computations. He is also indebted to the Office of Naval 
Research for support of this research. 
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